Knowledge-Conscious Exploratory Data Clustering

Authors

  • Amol Ghoting
  • Srinivasan Parthasarathy

Abstract

We consider the problem of efficiently executing data clustering queries in a client-server setting. Specifically, we consider an environment in which the entire data set is housed on a server and a client wishes to interactively perform kMeans clustering on different subsets of this data set. Extant solutions to this problem suffer from (a) a significant amount of remote I/O and (b) minimal reuse of computation, both across iterations of a single kMeans query and across executions of different kMeans queries. We propose to facilitate interactive kMeans clustering by employing a client-side knowledge cache. This knowledge cache is succinct and significantly reduces the amount of remote I/O needed during execution. Furthermore, it permits the reuse of computation both within and between executions of kMeans queries. Our experimental study shows that client-side knowledge caching can speed up execution by nearly an order of magnitude.
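
To make the remote-I/O argument concrete, here is a minimal sketch, assuming a server that exposes its data as numbered blocks and a client that runs plain Lloyd's kMeans over a chosen subset of blocks. `RemoteServer`, `ClientCache`, `fetch_block`, and `kmeans_query` are hypothetical names introduced only for illustration. The cache here simply keeps raw fetched blocks; it is a deliberately simpler stand-in and does not reproduce the paper's succinct knowledge cache or its reuse of computation.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class RemoteServer:
    """Hypothetical server holding the full data set as numbered blocks.
    Every fetch_block call is counted as one remote I/O."""

    def __init__(self, blocks: Dict[int, List[Point]]):
        self._blocks = blocks
        self.remote_fetches = 0

    def fetch_block(self, block_id: int) -> List[Point]:
        self.remote_fetches += 1
        return list(self._blocks[block_id])


class ClientCache:
    """Hypothetical client-side cache: a block is fetched remotely at most
    once; later queries over overlapping subsets read it locally."""

    def __init__(self, server: RemoteServer):
        self._server = server
        self._blocks: Dict[int, List[Point]] = {}

    def get_block(self, block_id: int) -> List[Point]:
        if block_id not in self._blocks:
            self._blocks[block_id] = self._server.fetch_block(block_id)
        return self._blocks[block_id]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_query(cache: ClientCache, block_ids: Sequence[int], k: int,
                 iterations: int = 10, seed: int = 0) -> List[Point]:
    """Plain Lloyd's kMeans over the union of the requested blocks.
    Points are read through the cache once; all iterations then run locally."""
    points = [p for b in block_ids for p in cache.get_block(b)]
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    dims = len(points[0])
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            counts[j] += 1
            for d, x in enumerate(p):
                sums[j][d] += x
        # Recompute centroids; keep the old one if a cluster became empty.
        centroids = [tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
                     for j in range(k)]
    return centroids


blocks = {0: [(0.0, 0.0), (0.1, 0.2)],
          1: [(5.0, 5.0), (5.2, 4.9)],
          2: [(9.0, 0.0), (9.1, 0.3)]}
server = RemoteServer(blocks)
cache = ClientCache(server)
kmeans_query(cache, [0, 1], k=2)               # fetches blocks 0 and 1
result = kmeans_query(cache, [0, 1, 2], k=3)   # only block 2 is fetched remotely
print(server.remote_fetches)  # 3 (without the cache it would be 5)
```

The two exploratory queries overlap on blocks 0 and 1, so the cache turns five remote fetches into three. The abstract's second point, reusing computation within and between queries, is not modeled by this sketch.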

Similar resources

A clustering approach for mineral potential mapping: A deposit-scale porphyry copper exploration targeting

This work describes a knowledge-guided clustering approach for mineral potential mapping (MPM), by which the optimum number of clusters is derived from a knowledge-driven methodology through a concentration-area (C-A) multifractal analysis. To implement the proposed approach, a case study at the North Narbaghi region in the Saveh, Markazi province of Iran, was investigated to discover porphyry ...

Collaborative and Knowledge-based Fuzzy Clustering

Clustering is commonly regarded as a synonym of unsupervised learning aimed at the discovery of structure in highly dimensional data. With a plethora of existing algorithms, the area offers an evident diversity of possible approaches along with their underlying features and potential applications. When augmented by fuzzy sets, fuzzy clustering has become an integral component of Computational I...

A new knowledge-based constrained clustering approach: Theory and application in direct marketing

Clustering has always been an exploratory but critical step in the knowledge discovery process. Often unsupervised, the clustering task received a huge interest when reinforced by different kinds of inputs provided by the user. This paper presents an approach giving the possibility to incorporate business knowledge in order to guide the clustering algorithm. A formalization of the fact that an ...

Amoeba: Hierarchical Clustering Based on Spatial Proximity Using Delaunay Diagram

Exploratory data analysis is increasingly more necessary as larger spatial data is managed in electro-magnetic media. We propose an exploratory method that reveals a robust clustering hierarchy. Our approach uses the Delaunay diagram to incorporate spatial proximity. It does not require any prior knowledge about the data set, nor does it require parameters from the user. Multi-level clusters ar...

Co-clustering of biological networks and gene expression data

MOTIVATION Large scale gene expression data are often analysed by clustering genes based on gene expression data alone, though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably. RESULTS We propose constructing a distance function which combines information from expression data and bi...

Publication date: 2006